home *** CD-ROM | disk | FTP | other *** search
- 03-12-91
-
- This is V1.1 of REGEX Globber.
-
-
- 03-12-91
-
- I have made a few changes to the match module which do several
- things. The first change is an increase in bad pattern detection
- during a match. It was possible, in some very unlikely cases, to
- cook up a pattern which should result in an early bad match but which
- would actually cause problems for the parser. In particular, the
- subcase where the literal escape '\' within an open [..] construct at
- the end of a pattern would end up with incorrect results. I
- proceeded to create some of these patterns, added them to my test
- battery and dove straight in.
-
- In the interim I came across a posting to CompuServe (SMATCH by Stan
- Aderman) which attempted to create a completely non-recursive
- implementation of match (I am not sure this is possible without
- explicitly creating your own stack or it's equivelant, like a binary
- tree :-{ ). While the code could not correctly handle multiple '*'
- characters in the pattern, there was a few interesting ideas in the
- posting. On some occasions, running match over and over would be
- counter productive, especially and in particular when you have a bad
- pattern. I have added a fast routine, is_valid_pattern(), to
- determine if the current pattern is well formed which should address
- this situation.
-
- One other idea which I unceremoniously lifted from SMATCH was (in
- hindsight a pretty obvious feature) the return of a meaningful error
- code from both the pattern validity routine and from match() (which I
- renamed to matche()).
-
- I also took some time to experiment with some ways to cut some time
- off the routine. Since this is a SH pattern matcher whose intent is
- primarily for shell functions, the changes could not be algorithmic
- changes which relied on speedup over large input. The differences in
- execution time were not very significant, but I did manage to gain
- approximately 5%-10% speedup when I removed the literal escape ('\')
- parsing and pattern error checking. For those of you who want to use
- this for filename wildcard usage, I would recommend doing this since
- you should use is_valid_pattern and is_pattern before going out and
- finding filenames and the dos path delimiter defaults to the
- character used for the literal escape ('\') anyway (Note: I will be
- soon be releasing a *IX style file parser in the FINDFILE, FINDNEXT
- flavor soon to a Public Domain archive near you :-) ).
-
- I also briefly toyed with adding a non-SH regex character '+' to this
- module but removed it again. It was a performance hit of a few
- percent and would be mostly unused in any event. For those
- interested in such a feature, the changes are truly minimal. The
- required extra work is:
-
- 1) One case statement each in is_pattern() and is_valid_pattern()
- 2) One case statement in matche()
- 3) One addition to a while conditional in matche_after_star()
- 4) One addition to an if conditional in matche_after_star()
-
- Hint: The case statements are all "case '+'" and the conditionals
- have "|| *p == '+' " added to them.
-
- I have also included a file (MATCH.DOC) which describes matches use and
- background as well as a little about regular expressions.
-
- jbk
-
- 02-24-91
-
- This is V1.01 of REGEX Globber.
-
-
- 02-22-91 Seattle, WA
-
- Hmm. Choke. (Foot in mouth). After griping about buggy routines and
- literally seconds after posting this code the first time, I received
- a wonderful new test evaluation tool which allows you to perform
- coverage analysis during testing. Sure enough I found that about
- 25% of the paths in the program were never traversed in my current
- test battery. After swallowing my (overly large) pride and coming
- up with a test battery which covered the entire path of the program
- I found a couple of minor logic bugs involving literal escapes (\)
- within other patterns (ie [..] and * sequences). I have repackaged
- these routines and included also the makefile I use and the test
- battery I use to make things a bit easier.
-
- jbk
-
- 02-20-91 Seattle, WA
-
- Here is a *IX wildcard globber I butchered, hacked and cajoled together
- after seeing and hearing about and becoming disgusted with several similar
- routines which had one or more of the following attributes: slow, buggy,
- required large levels of recursion on matches, required grotesque levels
- of recursion on failing matches using '*', full of caveats about usability
- or copyrights.
-
- I submit this without copyright and with the clear understanding that
- this code may be used by anyone, for any reason, with any modifications
- and without any guarantees, warrantee or statements of usability of any
- sort.
-
- Having gotten those cow chips out of the way, these routines are fairly
- well tested and reasonably fast. I have made an effort to fail on all
- bad patterns and to quickly determine failing '*' patterns. This parser
- will also do quite a bit of the '*' matching via quick linear loops versus
- the standard blind recursive descent.
-
- This parser has been submitted to profilers at various stages of development
- and has come through quite well. If the last millisecond is important to
- you then some time can be shaved by using stack allocated variables in
- place of many of the pointer follows (which may be done fairly often) found
- in regex_match and regex_match_after_star (ie *p, *t).
-
- No attempt is made to provide general [pat,pat] comparisons. The specific
- subcases supplied by these routines is [pat,text] which is sufficient
- for the large majority of cases (should you care).
-
- Since regex_match may return one of three different values depending upon
- the pattern and text I have made a simple shell for convenience (match()).
- Also included is an is_pattern routine to quickly check a potential pattern
- for regex special characters. I even placed this all in a header file for
- you lazy folks!
-
- Having said all that, here is my own reinvention of the wheel. Please
- enjoy it's use and I hope it is of some help to those with need ....
-
-
- jbk
-
-